Overview

Dataset statistics

Number of variables: 11
Number of observations: 3995
Missing cells: 2569
Missing cells (%): 5.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 343.4 KiB
Average record size in memory: 88.0 B
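
The percentages and averages in this table follow from the raw counts. The sketch below is a quick consistency check using only numbers copied from the table; the variable names are illustrative and are not part of the report.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 11
n_observations = 3995
missing_cells = 2569
total_size_kib = 343.4

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total memory (KiB -> bytes) divided by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```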

Variable types

Categorical: 10
Numeric: 1

Alerts

newsdesk has a high cardinality: 55 distinct values High cardinality
subsection has a high cardinality: 53 distinct values High cardinality
headline has a high cardinality: 3962 distinct values High cardinality
abstract has a high cardinality: 3949 distinct values High cardinality
keywords has a high cardinality: 3701 distinct values High cardinality
pub_date has a high cardinality: 3663 distinct values High cardinality
uniqueID has a high cardinality: 3995 distinct values High cardinality
is_popular is highly correlated with section and 2 other fields High correlation
section is highly correlated with is_popular and 2 other fields High correlation
newsdesk is highly correlated with is_popular and 3 other fields High correlation
material is highly correlated with newsdesk and 1 other fields High correlation
subsection is highly correlated with is_popular and 3 other fields High correlation
newsdesk is highly correlated with section and 3 other fields High correlation
section is highly correlated with newsdesk and 3 other fields High correlation
subsection is highly correlated with newsdesk and 4 other fields High correlation
material is highly correlated with newsdesk and 2 other fields High correlation
word_count is highly correlated with subsection High correlation
is_popular is highly correlated with newsdesk and 2 other fields High correlation
subsection has 2569 (64.3%) missing values Missing
headline is uniformly distributed Uniform
abstract is uniformly distributed Uniform
pub_date is uniformly distributed Uniform
uniqueID is uniformly distributed Uniform
uniqueID has unique values Unique
word_count has 121 (3.0%) zeros Zeros

Reproduction

Analysis started: 2021-11-23 03:27:02.972648
Analysis finished: 2021-11-23 03:27:10.437347
Duration: 7.46 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

newsdesk
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 55
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
OpEd | 440
Culture | 277
Washington | 231
Foreign | 207
Science | 205
Other values (50) | 2635

Length

Max length: 15
Median length: 7
Mean length: 7.063078849
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.2%

Sample

1st row: OpEd
2nd row: OpEd
3rd row: OpEd
4th row: Games
5th row: Sports

Common Values

Value | Count | Frequency (%)
OpEd | 440 | 11.0%
Culture | 277 | 6.9%
Washington | 231 | 5.8%
Foreign | 207 | 5.2%
Science | 205 | 5.1%
Business | 201 | 5.0%
Learning | 189 | 4.7%
Metro | 180 | 4.5%
Politics | 179 | 4.5%
Sports | 165 | 4.1%
Other values (45) | 1721 | 43.1%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
oped | 440 | 11.0%
culture | 277 | 6.9%
washington | 231 | 5.8%
foreign | 207 | 5.2%
science | 205 | 5.1%
business | 204 | 5.1%
learning | 189 | 4.7%
metro | 180 | 4.5%
politics | 179 | 4.5%
sports | 165 | 4.1%
Other values (46) | 1739 | 43.3%

section
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 36
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
U.S. | 604
Opinion | 494
Arts | 273
New York | 229
World | 226
Other values (31) | 2169

Length

Max length: 20
Median length: 7
Mean length: 7.579474343
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: Opinion
2nd row: Opinion
3rd row: Opinion
4th row: Crosswords & Games
5th row: Sports

Common Values

Value | Count | Frequency (%)
U.S. | 604 | 15.1%
Opinion | 494 | 12.4%
Arts | 273 | 6.8%
New York | 229 | 5.7%
World | 226 | 5.7%
The Learning Network | 198 | 5.0%
Business Day | 196 | 4.9%
Sports | 170 | 4.3%
Real Estate | 162 | 4.1%
Well | 148 | 3.7%
Other values (26) | 1295 | 32.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
u.s | 604 | 11.2%
opinion | 494 | 9.1%
the | 289 | 5.4%
arts | 273 | 5.1%
new | 229 | 4.2%
york | 229 | 4.2%
world | 226 | 4.2%
learning | 198 | 3.7%
network | 198 | 3.7%
business | 196 | 3.6%
Other values (37) | 2464 | 45.6%

subsection
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 53
Distinct (%): 3.7%
Missing: 2569
Missing (%): 64.3%
Memory size: 31.3 KiB
Politics | 373
Television | 110
Europe | 79
The Daily | 78
Music | 59
Other values (48) | 727

Length

Max length: 22
Median length: 8
Mean length: 8.683730715
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.8%

Sample

1st row: Pro Football
2nd row: Politics
3rd row: Television
4th row: Mind
5th row: Wine, Beer & Cocktails

Common Values

Value | Count | Frequency (%)
Politics | 373 | 9.3%
Television | 110 | 2.8%
Europe | 79 | 2.0%
The Daily | 78 | 2.0%
Music | 59 | 1.5%
Sunday Review | 58 | 1.5%
Family | 58 | 1.5%
Asia Pacific | 57 | 1.4%
Art & Design | 48 | 1.2%
Pro Football | 47 | 1.2%
Other values (43) | 459 | 11.5%
(Missing) | 2569 | 64.3%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
politics | 374 | 19.2%
television | 110 | 5.6%
review | 93 | 4.8%
europe | 79 | 4.1%
(empty) | 78 | 4.0%
the | 78 | 4.0%
daily | 78 | 4.0%
pro | 64 | 3.3%
music | 59 | 3.0%
family | 58 | 3.0%
Other values (60) | 878 | 45.0%

material
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 9
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
News | 3154
Op-Ed | 456
Interactive Feature | 121
Review | 115
briefing | 60
Other values (4) | 89

Length

Max length: 19
Median length: 4
Mean length: 4.877596996
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Op-Ed
2nd row: Op-Ed
3rd row: Op-Ed
4th row: News
5th row: News

Common Values

Value | Count | Frequency (%)
News | 3154 | 78.9%
Op-Ed | 456 | 11.4%
Interactive Feature | 121 | 3.0%
Review | 115 | 2.9%
briefing | 60 | 1.5%
Obituary (Obit) | 47 | 1.2%
Editorial | 29 | 0.7%
News Analysis | 11 | 0.3%
Letter | 2 | 0.1%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
news | 3165 | 75.8%
op-ed | 456 | 10.9%
interactive | 121 | 2.9%
feature | 121 | 2.9%
review | 115 | 2.8%
briefing | 60 | 1.4%
obituary | 47 | 1.1%
obit | 47 | 1.1%
editorial | 29 | 0.7%
analysis | 11 | 0.3%

headline
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 3962
Distinct (%): 99.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
Variety: Acrostic | 6
Homes for Sale in New York and New Jersey | 6
Homes for Sale in New York and Connecticut | 6
What the Heck Is That? | 4
Homes for Sale in Brooklyn, Queens and Manhattan | 4
Other values (3957) | 3969

Length

Max length: 123
Median length: 56
Mean length: 53.25957447
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3947
Unique (%): 98.8%

Sample

1st row: Anyone Else Want to See Trump ‘Shut Up’?
2nd row: Trump Calls on Extremists to ‘Stand By’
3rd row: Can Mike Espy Make History, Again?
4th row: In Which Rikishi Wear Mawashi
5th row: N.F.L. Week 4 Predictions: Our Picks Against the Spread

Common Values

Value | Count | Frequency (%)
Variety: Acrostic | 6 | 0.2%
Homes for Sale in New York and New Jersey | 6 | 0.2%
Homes for Sale in New York and Connecticut | 6 | 0.2%
What the Heck Is That? | 4 | 0.1%
Homes for Sale in Brooklyn, Queens and Manhattan | 4 | 0.1%
The Crossword Stumper | 3 | 0.1%
Homes for Sale in Brooklyn, Manhattan and Queens | 3 | 0.1%
$1.6 Million Homes in California | 2 | 0.1%
Homes for Sale in Brooklyn, Manhattan and the Bronx | 2 | 0.1%
Homes for Sale in Brooklyn, Manhattan and Staten Island | 2 | 0.1%
Other values (3952) | 3957 | 99.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
the | 1580 | 4.4%
a | 1018 | 2.8%
to | 836 | 2.3%
in | 774 | 2.2%
of | 718 | 2.0%
and | 587 | 1.6%
for | 483 | 1.3%
is | 383 | 1.1%
trump | 297 | 0.8%
how | 284 | 0.8%
Other values (8309) | 28954 | 80.6%

abstract
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 3949
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
What is this image saying? | 11
Look closely at this image, stripped of its caption, and join the moderated conversation about what you and other students see. | 10
Teenage comments in response to our recent writing prompts, and an invitation to join the ongoing conversation. | 9
What story does this image inspire for you? | 8
Our critics and writers have selected noteworthy cultural events to experience virtually or in person in New York City. | 4
Other values (3944) | 3953

Length

Max length: 626
Median length: 132
Mean length: 129.5048811
Min length: 18

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3938
Unique (%): 98.6%

Sample

1st row: Our president as a terrible toddler.
2nd row: Instead of condemning violent groups, the president marshals them.
3rd row: If the Democratic Party claims to value Black support, then they should work harder to make it happen.
4th row: Adam Fromm is on the line.
5th row: Tom Brady and the Buccaneers are building momentum and the Bears hope to continue an improbable start. Two games — Chiefs-Patriots and Titans-Steelers — have been postponed.

Common Values

Value | Count | Frequency (%)
What is this image saying? | 11 | 0.3%
Look closely at this image, stripped of its caption, and join the moderated conversation about what you and other students see. | 10 | 0.3%
Teenage comments in response to our recent writing prompts, and an invitation to join the ongoing conversation. | 9 | 0.2%
What story does this image inspire for you? | 8 | 0.2%
Our critics and writers have selected noteworthy cultural events to experience virtually or in person in New York City. | 4 | 0.1%
A look at one of the entries that fooled solvers in last week’s puzzles. | 3 | 0.1%
A look at one of the entries from last week’s puzzles that stumped our solvers. | 3 | 0.1%
Recent residential sales in New York City and the region. | 3 | 0.1%
Trees appear to communicate and cooperate through subterranean networks of fungi. What are they sharing with one another? | 2 | 0.1%
We chart the trials of a tavern in Oakland, Calif., that was thriving until the pandemic brought economic and emotional turmoil. | 2 | 0.1%
Other values (3939) | 3940 | 98.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
the | 5037 | 5.9%
a | 2517 | 3.0%
of | 2287 | 2.7%
to | 2259 | 2.7%
and | 2193 | 2.6%
in | 1869 | 2.2%
for | 850 | 1.0%
is | 740 | 0.9%
that | 654 | 0.8%
are | 638 | 0.7%
Other values (13625) | 66138 | 77.6%

keywords
Categorical

HIGH CARDINALITY

Distinct: 3701
Distinct (%): 92.6%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
[] | 205
['Crossword Puzzles'] | 53
['New York City'] | 14
['Television', 'Fargo (TV Program)'] | 8
['Customs, Etiquette and Manners', 'Content Type: Service'] | 4
Other values (3696) | 3711

Length

Max length: 1381
Median length: 166
Mean length: 176.1924906
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3684
Unique (%): 92.2%

Sample

1st row: ['Presidential Election of 2020', 'Biden, Joseph R Jr', 'Trump, Donald J', 'Debates (Political)']
2nd row: ['Presidential Election of 2020', 'United States Politics and Government', 'Right-Wing Extremism and Alt-Right', 'Fringe Groups and Movements', 'Whites', 'Debates (Political)', 'Demonstrations, Protests and Riots', 'Trump, Donald J', 'United States']
3rd row: ['Black People', 'Blacks', 'Presidential Election of 2020', 'United States Politics and Government', 'State Legislatures', 'Elections, Senate', 'Democratic Party', 'Republican Party', 'Senate', 'Espy, Mike', 'Mississippi']
4th row: ['Crossword Puzzles']
5th row: ['Football', 'New England Patriots', 'Kansas City Chiefs', 'Los Angeles Chargers', 'Tampa Bay Buccaneers', 'Indianapolis Colts', 'Chicago Bears', 'Buffalo Bills', 'Las Vegas Raiders', 'Tennessee Titans', 'Pittsburgh Steelers', 'Mahomes, Patrick (1995- )', 'Baltimore Ravens']

Common Values

Value | Count | Frequency (%)
[] | 205 | 5.1%
['Crossword Puzzles'] | 53 | 1.3%
['New York City'] | 14 | 0.4%
['Television', 'Fargo (TV Program)'] | 8 | 0.2%
['Customs, Etiquette and Manners', 'Content Type: Service'] | 4 | 0.1%
['Football', 'National Football League'] | 3 | 0.1%
['Television', 'The Mandalorian (TV Program)'] | 3 | 0.1%
['Customs, Etiquette and Manners'] | 3 | 0.1%
['Presidential Election of 2020', 'United States Politics and Government', 'Biden, Joseph R Jr', 'Trump, Donald J'] | 2 | 0.1%
['internal-essential'] | 2 | 0.1%
Other values (3691) | 3698 | 92.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
and | 6504 | 7.9%
of | 1357 | 1.6%
states | 1342 | 1.6%
united | 1302 | 1.6%
coronavirus | 1261 | 1.5%
2019-ncov | 977 | 1.2%
politics | 971 | 1.2%
government | 952 | 1.2%
2020 | 943 | 1.1%
election | 889 | 1.1%
Other values (8747) | 66055 | 80.0%

word_count
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1691
Distinct (%): 42.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1247.067084
Minimum: 0
Maximum: 15619
Zeros: 121
Zeros (%): 3.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 214
Q1: 876.5
Median: 1188
Q3: 1481
95-th percentile: 2355.5
Maximum: 15619
Range: 15619
Interquartile range (IQR): 604.5

Descriptive statistics

Standard deviation: 815.9134971
Coefficient of variation (CV): 0.6542659234
Kurtosis: 42.38546579
Mean: 1247.067084
Median Absolute Deviation (MAD): 305
Skewness: 4.260082579
Sum: 4982033
Variance: 665714.8348
Monotonicity: Not monotonic
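
Several of the word_count statistics are simple functions of the others. The sketch below re-derives the range, IQR, mean, variance, and coefficient of variation from figures copied out of the two tables; the variable names are illustrative, and the definitions used (IQR = Q3 − Q1, CV = standard deviation / mean) are the standard ones rather than anything read from the report's source code.

```python
# Figures copied from the word_count quantile and descriptive statistics.
n_observations = 3995
minimum, maximum = 0, 15619
q1, q3 = 876.5, 1481
total = 4982033
std_dev = 815.9134971

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values
mean = total / n_observations     # sum divided by the number of rows
variance = std_dev ** 2           # variance is the squared standard deviation
cv = std_dev / mean               # dispersion relative to the mean

print(f"range={value_range}, IQR={iqr}")
print(f"mean={mean:.6f}, variance={variance:.4f}, CV={cv:.10f}")
```

A skewness above 4 together with a mean (about 1247) above the median (1188) is consistent with a long right tail, which the maximum of 15619 also suggests.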
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 121 | 3.0%
893 | 12 | 0.3%
1255 | 10 | 0.3%
1277 | 10 | 0.3%
1190 | 9 | 0.2%
1252 | 9 | 0.2%
1136 | 9 | 0.2%
909 | 9 | 0.2%
896 | 8 | 0.2%
1337 | 8 | 0.2%
Other values (1681) | 3790 | 94.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 121 | 3.0%
16 | 1 | < 0.1%
103 | 2 | 0.1%
114 | 1 | < 0.1%
116 | 1 | < 0.1%
124 | 1 | < 0.1%
126 | 3 | 0.1%
128 | 1 | < 0.1%
130 | 2 | 0.1%
131 | 2 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
15619 | 1 | < 0.1%
10496 | 1 | < 0.1%
8423 | 1 | < 0.1%
8384 | 1 | < 0.1%
8223 | 1 | < 0.1%
7815 | 1 | < 0.1%
7759 | 1 | < 0.1%
7677 | 1 | < 0.1%
7550 | 1 | < 0.1%
7518 | 1 | < 0.1%

pub_date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 3663
Distinct (%): 91.7%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
2020-10-14 09:00:29+00:00 | 5
2020-11-13 10:00:21+00:00 | 5
2020-11-20 10:00:25+00:00 | 4
2020-12-23 10:00:32+00:00 | 4
2020-11-16 10:00:08+00:00 | 4
Other values (3658) | 3973

Length

Max length: 25
Median length: 25
Mean length: 25
Min length: 25

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3407
Unique (%): 85.3%

Sample

1st row: 2020-10-01 00:05:51+00:00
2nd row: 2020-10-01 00:43:28+00:00
3rd row: 2020-10-01 00:45:17+00:00
4th row: 2020-10-01 02:00:05+00:00
5th row: 2020-10-01 04:01:16+00:00

Common Values

Value | Count | Frequency (%)
2020-10-14 09:00:29+00:00 | 5 | 0.1%
2020-11-13 10:00:21+00:00 | 5 | 0.1%
2020-11-20 10:00:25+00:00 | 4 | 0.1%
2020-12-23 10:00:32+00:00 | 4 | 0.1%
2020-11-16 10:00:08+00:00 | 4 | 0.1%
2020-11-12 10:00:29+00:00 | 4 | 0.1%
2020-11-24 10:00:21+00:00 | 4 | 0.1%
2020-11-04 10:00:18+00:00 | 4 | 0.1%
2020-12-11 10:00:25+00:00 | 4 | 0.1%
2020-12-23 10:00:11+00:00 | 4 | 0.1%
Other values (3653) | 3953 | 98.9%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
2020-10-28 | 72 | 0.9%
2020-12-23 | 72 | 0.9%
2020-12-15 | 70 | 0.9%
2020-11-02 | 70 | 0.9%
2020-12-09 | 68 | 0.9%
2020-12-02 | 66 | 0.8%
2020-10-13 | 66 | 0.8%
2020-10-30 | 65 | 0.8%
2020-11-10 | 65 | 0.8%
2020-11-23 | 64 | 0.8%
Other values (2380) | 7312 | 91.5%

is_popular
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
0 | 2064
1 | 1931

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2064 | 51.7%
1 | 1931 | 48.3%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
0 | 2064 | 51.7%
1 | 1931 | 48.3%

uniqueID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 3995
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 31.3 KiB
nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f | 1
nyt://article/ef3d1eb3-6297-5711-bb2a-27bee6fc6831 | 1
nyt://article/ca3eaa6a-be4d-5a0d-b368-070772436120 | 1
nyt://article/fe6bcb27-806e-5b82-bf26-fd954254c1a5 | 1
nyt://article/bef36e56-8080-5278-80a9-8927bee7c0b3 | 1
Other values (3990) | 3990

Length

Max length: 54
Median length: 50
Mean length: 50.12115144
Min length: 50

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3995
Unique (%): 100.0%

Sample

1st row: nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f
2nd row: nyt://article/9a7ef9e0-1334-56b2-a7f1-288c48873b06
3rd row: nyt://article/4bb2b763-0088-5e10-b204-19e404f744ec
4th row: nyt://article/0d96205f-edb8-5f1f-8c44-1ddf6ed56d1a
5th row: nyt://article/afc8295b-3c22-5a5f-9539-3f77b7b8eeeb

Common Values

Value | Count | Frequency (%)
nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f | 1 | < 0.1%
nyt://article/ef3d1eb3-6297-5711-bb2a-27bee6fc6831 | 1 | < 0.1%
nyt://article/ca3eaa6a-be4d-5a0d-b368-070772436120 | 1 | < 0.1%
nyt://article/fe6bcb27-806e-5b82-bf26-fd954254c1a5 | 1 | < 0.1%
nyt://article/bef36e56-8080-5278-80a9-8927bee7c0b3 | 1 | < 0.1%
nyt://article/cf01199e-51f2-5a0d-b32a-6ec7ade4ff29 | 1 | < 0.1%
nyt://article/59d2cf49-531e-5263-a738-1c4488c0fb84 | 1 | < 0.1%
nyt://article/a0339f93-7ae8-5840-a825-642ab1f2ba02 | 1 | < 0.1%
nyt://article/9caf41fa-de3b-5bfa-8df9-290b41f1ad87 | 1 | < 0.1%
nyt://article/edfb7f02-ac48-5b65-ad7f-062e6cd36189 | 1 | < 0.1%
Other values (3985) | 3985 | 99.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f | 1 | < 0.1%
nyt://article/f4c9673b-8374-518b-b0a1-25cfae0921d0 | 1 | < 0.1%
nyt://article/c3295d25-e296-571a-a3dd-c7f56f317c32 | 1 | < 0.1%
nyt://article/138f58dc-301a-586a-bb94-8c010d0e789f | 1 | < 0.1%
nyt://article/4bb2b763-0088-5e10-b204-19e404f744ec | 1 | < 0.1%
nyt://article/0d96205f-edb8-5f1f-8c44-1ddf6ed56d1a | 1 | < 0.1%
nyt://article/afc8295b-3c22-5a5f-9539-3f77b7b8eeeb | 1 | < 0.1%
nyt://article/27e40157-1790-59fc-8153-11cc88950152 | 1 | < 0.1%
nyt://article/db8a2622-8509-5c2a-a8fe-6cb1ec8d0989 | 1 | < 0.1%
nyt://article/c1695e32-4822-51aa-958c-52d9ebacabcd | 1 | < 0.1%
Other values (3985) | 3985 | 99.7%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
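
As a minimal sketch of the calculation just described, the following pure-Python code ranks both variables (ties receive their average rank, a common convention assumed here) and then applies the covariance-over-standard-deviations formula to the ranks. The function names and the toy data are illustrative and are not taken from the report or from pandas-profiling.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)
r = pearson(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

On this toy data ρ is 1 (up to floating-point rounding) because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not linear.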

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
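
The formula and the invariance property can be checked on a small example. This is a minimal sketch with made-up data; `pearson_r` is an illustrative helper, not a function from the report's tooling.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2 * a + 3 for a in x]
y_shifted = [0.5 * b - 1 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

# Two exact linear relationships with different slopes.
r_shallow = pearson_r(x, [2 * a for a in x])
r_steep = pearson_r(x, [10 * a for a in x])

print(f"r = {r:.4f}, after shift/scale = {r_shifted:.4f}")
print(f"slope 2: {r_shallow:.4f}, slope 10: {r_steep:.4f}")
```

Both exact linear relationships give r = 1 regardless of slope, which is what "the angle to the x-axis does not affect r" means in practice.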

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
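
The pair-counting definition above can be sketched directly. Note the assumption: this is the plain form of the statistic (often called tau-a), which makes no correction for ties; library implementations commonly use a tie-corrected variant, so their results can differ on data with repeated values. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
tau_reversed = kendall_tau_a(x, [5, 4, 3, 2, 1])
print(f"tau = {tau}, reversed order tau = {tau_reversed}")
```

Here 8 of the 10 pairs are concordant and 2 are discordant, giving τ = (8 − 2) / 10.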

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
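
For orientation, here is a minimal pure-Python sketch of a bias-corrected Cramér's V. It assumes the commonly cited form of Bergsma's correction (shrinking φ² and the row and column counts by terms in 1/(n − 1)) and a plain chi-squared statistic without continuity correction; it is not taken from the report's implementation, and the function name and data are illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association: each x label always pairs with the same y label.
v_perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
# Independence: every combination of labels occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(f"perfect: {v_perfect:.4f}, independent: {v_independent:.4f}")
```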

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

index | newsdesk | section | subsection | material | headline | abstract | keywords | word_count | pub_date | is_popular | uniqueID
0 | OpEd | Opinion | NaN | Op-Ed | Anyone Else Want to See Trump ‘Shut Up’? | Our president as a terrible toddler. | ['Presidential Election of 2020', 'Biden, Joseph R Jr', 'Trump, Donald J', 'Debates (Political)'] | 925 | 2020-10-01 00:05:51+00:00 | 1 | nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f
1 | OpEd | Opinion | NaN | Op-Ed | Trump Calls on Extremists to ‘Stand By’ | Instead of condemning violent groups, the president marshals them. | ['Presidential Election of 2020', 'United States Politics and Government', 'Right-Wing Extremism and Alt-Right', 'Fringe Groups and Movements', 'Whites', 'Debates (Political)', 'Demonstrations, Protests and Riots', 'Trump, Donald J', 'United States'] | 902 | 2020-10-01 00:43:28+00:00 | 1 | nyt://article/9a7ef9e0-1334-56b2-a7f1-288c48873b06
2 | OpEd | Opinion | NaN | Op-Ed | Can Mike Espy Make History, Again? | If the Democratic Party claims to value Black support, then they should work harder to make it happen. | ['Black People', 'Blacks', 'Presidential Election of 2020', 'United States Politics and Government', 'State Legislatures', 'Elections, Senate', 'Democratic Party', 'Republican Party', 'Senate', 'Espy, Mike', 'Mississippi'] | 1412 | 2020-10-01 00:45:17+00:00 | 1 | nyt://article/4bb2b763-0088-5e10-b204-19e404f744ec
3 | Games | Crosswords & Games | NaN | News | In Which Rikishi Wear Mawashi | Adam Fromm is on the line. | ['Crossword Puzzles'] | 849 | 2020-10-01 02:00:05+00:00 | 1 | nyt://article/0d96205f-edb8-5f1f-8c44-1ddf6ed56d1a
4 | Sports | Sports | Pro Football | News | N.F.L. Week 4 Predictions: Our Picks Against the Spread | Tom Brady and the Buccaneers are building momentum and the Bears hope to continue an improbable start. Two games — Chiefs-Patriots and Titans-Steelers — have been postponed. | ['Football', 'New England Patriots', 'Kansas City Chiefs', 'Los Angeles Chargers', 'Tampa Bay Buccaneers', 'Indianapolis Colts', 'Chicago Bears', 'Buffalo Bills', 'Las Vegas Raiders', 'Tennessee Titans', 'Pittsburgh Steelers', 'Mahomes, Patrick (1995- )', 'Baltimore Ravens'] | 2690 | 2020-10-01 04:01:16+00:00 | 0 | nyt://article/afc8295b-3c22-5a5f-9539-3f77b7b8eeeb
5 | Learning | The Learning Network | NaN | News | Confrontation | What story does this image inspire for you? | [] | 161 | 2020-10-01 07:00:02+00:00 | 0 | nyt://article/27e40157-1790-59fc-8153-11cc88950152
6 | Politics | U.S. | Politics | News | For Voters Still Mulling, One Thing Is Clear: That Debate Didn’t Help | A small but crucial segment of likely voters say they remain uncommitted — to a candidate or to voting at all — and nothing they heard on Tuesday clinched things for them. | ['Presidential Election of 2020', 'Debates (Political)', 'Voting and Voters', 'Democratic Party', 'Republican Party', 'Trump, Donald J', 'Biden, Joseph R Jr'] | 1270 | 2020-10-01 07:00:06+00:00 | 1 | nyt://article/db8a2622-8509-5c2a-a8fe-6cb1ec8d0989
7 | Culture | Arts | Television | News | After ‘The Salisbury Poisonings,’ Locals Picked Up the Pieces | A new AMC show dramatizes the 2018 poisoning of a former Russian spy in Britain. Even for a reporter who covered the real events, the four episodes contain revelations. | ['Television', 'The Salisbury Poisonings (TV Program)', 'Lawn, Declan', 'Patterson, Adam (Filmmaker)', 'AMC (TV Network)', 'Skripal, Sergei V', 'Poisoning and Poisons', 'Assassinations and Attempted Assassinations', 'Espionage and Intelligence Services', 'News and News Media', 'Russia', 'Great Britain', 'Sturgess, Dawn', 'Salisbury (England)'] | 1165 | 2020-10-01 08:13:43+00:00 | 0 | nyt://article/c1695e32-4822-51aa-958c-52d9ebacabcd
8 | Learning | The Learning Network | NaN | News | Are You Having a Tough Time Maintaining Friendships These Days? | Has the pandemic brought you closer together with friends? Or moved you farther apart? | [] | 918 | 2020-10-01 09:00:03+00:00 | 1 | nyt://article/18dcd4cf-e1c8-5741-af33-38d1198e44e1
9 | Magazine | Magazine | NaN | News | Distance Learning, With Shades of Big Brother | A video on digital classroom etiquette makes it very clear: Your home is no longer your own, and your kids must pretend to learn in it. | ['E-Learning', 'Children and Childhood', 'Education (K-12)', 'Quarantine (Life and Culture)', 'Customs, Etiquette and Manners'] | 1238 | 2020-10-01 09:00:04+00:00 | 1 | nyt://article/c617ba64-da38-5ee3-95d4-aa12d157d741

Last rows

index | newsdesk | section | subsection | material | headline | abstract | keywords | word_count | pub_date | is_popular | uniqueID
3985 | RealEstate | Real Estate | NaN | News | Homes for Sale in New York and Connecticut | This week’s properties include a five-bedroom in Great Neck, N.Y., and a three-bedroom in Fairfield, Conn. | ['Real Estate and Housing (Residential)', 'Great Neck (NY)', 'Fairfield (Conn)'] | 124 | 2020-12-31 14:00:20+00:00 | 0 | nyt://article/b0e4fe64-8bee-5a38-81c8-c5ecc595ec9a
3986 | RealEstate | Real Estate | NaN | News | Homes for Sale in Brooklyn, Manhattan and Staten Island | This week’s properties are in Downtown Brooklyn, the Flatiron district and Grymes Hill. | ['Real Estate and Housing (Residential)', 'Downtown Brooklyn (Brooklyn, NY)', 'Flatiron District (Manhattan, NY)', 'Grymes Hill (Staten Island, NY)'] | 130 | 2020-12-31 14:00:24+00:00 | 0 | nyt://article/6798b89f-8926-5e39-9d72-aa5f03eb02aa
3987 | Sports | Sports | Pro Basketball | News | Becky Hammon Becomes First Woman to Serve as Head Coach in N.B.A. Game | She took over coaching the San Antonio Spurs after Gregg Popovich was ejected from a game against the Los Angeles Lakers on Wednesday night. | ['Basketball', 'National Basketball Assn', 'Hammon, Becky', 'San Antonio Spurs', 'Los Angeles Lakers'] | 597 | 2020-12-31 14:30:03+00:00 | 0 | nyt://article/2df71ddc-ac42-54a7-9af2-84ceeba85960
3988 | Arts&Leisure | Arts | Art & Design | News | Superheroes and Trailblazers: Black Comic Book Artists, Rediscovered | A new book examines the lives of these trailblazers, who paved the way for subsequent generations of illustrators but were invisible to the mainstream in their own time. | ['Art', 'Comic Books and Strips', 'Black People', 'Blacks', 'Quattro, Ken', 'Invisible Men: The Trailblazing Black Artists of Comic Books (Book)', 'Herriman, George (1880-1944)', 'Jackson, Jay Paul', 'Stoner, Elmer C', 'Greene, Sanford', 'Middleton, Owen Charles'] | 1593 | 2020-12-31 15:00:09+00:00 | 0 | nyt://article/ae47f0b2-c2ba-5a89-adb9-2b7f4ce1667c
3989 | Science | Health | NaN | News | Here’s Why Distribution of the Vaccine Is Taking Longer Than Expected | Health officials and hospitals are struggling with a lack of resources. Holiday staffing and saving doses for nursing homes are also contributing to delays. | ['Vaccination and Immunization', 'Coronavirus (2019-nCoV)', 'Public-Private Sector Cooperation', 'States (US)', 'your-feed-healthcare'] | 1570 | 2020-12-31 15:26:56+00:00 | 1 | nyt://article/5320a2e9-d739-542a-a397-443c43231527
3990 | Editorial | Opinion | NaN | Op-Ed | What It Takes to Heal From Covid-19 | Survivors can get better, but they need help. | ['Chronic Condition (Health)', 'Coronavirus (2019-nCoV)', 'Health Insurance and Managed Care'] | 1002 | 2020-12-31 15:27:47+00:00 | 1 | nyt://article/e8adbb75-a8b3-5a8c-886b-b9c1195f607b
3991 | Sports | Sports | Baseball | News | Padres Jolt M.L.B. With Bold Moves to Set Up World Series Run | While many teams continued to assess the financial consequences of the coronavirus pandemic, San Diego acquired two pricey pitchers and instantly became one of the favorites to win the World Series. | ['San Diego Padres', 'Major League Baseball', 'Free Agents (Sports)', 'Trades (Sports)', 'Darvish, Yu', 'Snell, Blake (1992- )'] | 1100 | 2020-12-31 15:47:44+00:00 | 0 | nyt://article/1f11417d-2c57-51b9-b75d-8f67f0a98ba9
3992 | Business | Business Day | NaN | News | Their Finances Ravaged, Customers Fear Banks Will Withhold Stimulus Checks | Banks have the power to decide whether to let overdrawn customers gain access to the stimulus money being deposited into their accounts, but they have taken different approaches. | ['Banking and Financial Institutions', 'Coronavirus Aid, Relief, and Economic Security Act (2020)', 'Stimulus (Economic)', 'Prices (Fares, Fees and Rates)', 'Personal Finances'] | 1429 | 2020-12-31 16:21:40+00:00 | 1 | nyt://article/c4b9edab-bdde-5d81-b496-06fedb527c39
3993 | Dining | Food | Wine, Beer & Cocktails | News | Should Wine Be Among Your Health Resolutions? | The new category of ‘clean wines’ is an effort to appeal to those seeking wellness. But why try to rationalize wine as a healthful product? | ['Wines', 'Grapes', 'Diet and Nutrition', 'Diaz, Cameron', 'Power, Katherine (1980- )', 'Avaline Ltd'] | 1307 | 2020-12-31 17:28:11+00:00 | 1 | nyt://article/efcaf652-ffad-5b4e-9f17-4fd9aff5b1ba
3994 | Business | Technology | NaN | News | Microsoft Says Russian Hackers Viewed Some of Its Source Code | The hackers gained more access than the company previously revealed, though the attackers were unable to modify code or access emails. | ['Microsoft Corp', 'US Federal Government Data Breach (2020)', 'Cyberwarfare and Defense', 'Cyberattacks and Hackers', 'Computer Security', 'SolarWinds'] | 340 | 2020-12-31 18:02:02+00:00 | 1 | nyt://article/12048b2b-62e3-5bed-8c77-483a4299f465